Classifying clear and conversational speech based on acoustic features
Authors
Abstract
This paper reports an investigation of features relevant for classifying two speaking styles: conversational speech and clear (e.g., hyper-articulated) speech. Spectral and prosodic features were automatically extracted from speech and classified using decision tree classifiers and multilayer perceptrons, achieving accuracies of about 71% and 77%, respectively. More interestingly, we found that of the 56 features, only about 9 are needed to capture most of the predictive power. While perceptual studies have shown that spectral cues are more useful than prosodic features for intelligibility [1], here we find that prosodic features are more important for classification.
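As a rough illustration of how a small subset of features can carry most of the predictive power, the sketch below ranks 56 synthetic features by the accuracy of a one-level decision tree (a decision stump) fit to each feature alone. This is not the procedure used in the paper, which the abstract does not specify. The data are randomly generated, and the function names (`make_data`, `stump_accuracy`, `rank_features`) and the choice of 9 informative features with a mean shift of 2.0 are assumptions made purely for the example.

```python
import random


def make_data(n_per_class=200, n_features=56, n_informative=9, shift=2.0, seed=0):
    """Synthetic two-class data (hypothetical, not the paper's corpus).

    The first `n_informative` features have class means 0 and `shift`;
    the remaining features are pure noise for both classes.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [
                rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                for j in range(n_features)
            ]
            X.append(row)
            y.append(label)
    return X, y


def stump_accuracy(values, labels):
    """Best training accuracy of a single-threshold rule on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # Threshold below every point: everything is predicted as class 1.
    correct = sum(labels)
    best = max(correct, n - correct)  # n - correct covers the flipped rule
    for _, label in pairs:
        # Moving the threshold past this point predicts it as class 0.
        correct += 1 if label == 0 else -1
        best = max(best, correct, n - correct)
    return best / n


def rank_features(X, y):
    """Return feature indices sorted by stump accuracy, plus the raw scores."""
    n_features = len(X[0])
    scores = [stump_accuracy([row[j] for row in X], y) for j in range(n_features)]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    X, y = make_data()
    ranking, scores = rank_features(X, y)
    for j in ranking[:12]:
        print(f"feature {j:2d}: stump accuracy {scores[j]:.3f}")
```

On data like this, the stump accuracies drop sharply after the informative features, which is the kind of pattern that would justify keeping only a handful of features. A univariate ranking ignores interactions between features, so it is only a first approximation of feature relevance.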
Similar resources
Tonal articulatory feature for Mandarin and its application to conversational LVCSR
This paper presents our recent work on the development of a tonal Articulatory Feature (AF) for Mandarin and its application to conversational LVCSR. Motivated by the theory of Mandarin phonology, eight features for classifying the acoustic units and one feature for classifying the tone are investigated and constructed in the paper, and the AF-based tandem approach is used to improve speech rec...
Determining the relevance of different aspects of formant contours to intelligibility
Previous studies have shown that "clear" speech, where the speaker intentionally tries to enunciate, has better intelligibility than "conversational" speech, which is produced in regular conversation. However, conversational and clear speech vary along a number of acoustic dimensions and it is unclear what aspects of clear speech lead to better intelligibility. Previously, Kain et al. [J. Acous...
Hybridizing conversational and clear speech
“Clear” (CLR) speech is a speaking style that speakers adopt to be understood correctly in a difficult communication environment. Studies have shown that CLR speech, as opposed to “conversational” (CNV) speech, has significantly higher intelligibility in various conditions. While many differences in acoustic features have been identified, it is not known which individual feature or combinations...
Modeling speaker variability using long short-term memory networks for speech recognition
Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area of research. Considering that long short-term memory (LSTM) recurrent neural networks (RNNs) have been successfully applied to many sequence prediction and sequence labeling tasks, we propose to use LSTM RNNs for modeling speaker variability in automatic speech recognition (ASR). Firstly, the LST...
A review of research on speech intelligibility and correlations with acoustic features
This review article provides an overview of differences between conversational (or cnv) and clear (or clr) speech, for a variety of speakers, in terms of speech intelligibility, and in terms of acoustic characteristics. Researchers have studied the relationship between acoustic features and speech intelligibility by, for example, studying correlations. However, the question “which acoustic feat...